Fast Compressed Self-Indexes with Deterministic Linear-Time Construction
Authors
Abstract
We introduce a compressed suffix array representation that, on a text T of length n over an alphabet of size σ, can be built in O(n) deterministic time, within O(n log σ) bits of working space, and counts the number of occurrences of any pattern P in T in time O(|P| + log log_w σ) on a RAM machine with words of w = Ω(log n) bits. This new index outperforms all the other compressed indexes that can be built in linear deterministic time, and some others. The only faster indexes can be built in linear time only in expectation, or require Θ(n log n) bits. We also show that, by using O(n log σ) bits, we can build in linear time an index that counts in time O(|P| / log_σ n + log n (log log n)^2), which is RAM-optimal for w = Θ(log n) and sufficiently long patterns.
1998 ACM Subject Classification: E.1 Data Structures; E.4 Coding and Information Theory
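As a rough illustration of what a counting query on a suffix-array-based self-index computes, the sketch below runs the classical backward search over the Burrows-Wheeler transform. It is not the construction or the data structure from the abstract: the suffix array is built naively, the rank table `occ` is stored uncompressed, and the names `build_index` and `count` are hypothetical. It only shows how a non-empty pattern of length |P| can be counted with a constant number of table lookups per pattern symbol.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; for illustration only,
    # far from the linear-time construction discussed in the abstract.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_index(text):
    # Append a sentinel assumed to be smaller than every text symbol.
    text = text + "\0"
    sa = suffix_array(text)
    # BWT: the symbol preceding each suffix in sorted order
    # (index -1 wraps around to the sentinel for the suffix at position 0).
    bwt = [text[i - 1] for i in sa]
    alphabet = sorted(set(text))
    # C[c] = number of symbols in the text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i] (uncompressed table).
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(bwt)


def count(index, pattern):
    # Backward search: process the pattern right to left, narrowing a
    # half-open range [lo, hi) of suffix-array rows prefixed by the
    # part of the pattern read so far. Assumes a non-empty pattern.
    C, occ, n = index
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_index("abracadabra")
print(count(index, "abra"))
print(count(index, "cad"))
```

The uncompressed `occ` table takes space proportional to σ·n counters, which is exactly the kind of overhead that compressed indexes avoid by replacing it with succinct rank structures.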
Similar resources
Space-Efficient Construction of Compressed Indexes in Deterministic Linear Time
We show that the compressed suffix array and the compressed suffix tree of a string T can be built in O(n) deterministic time using O(n log σ) bits of space, where n is the string length and σ is the alphabet size. Previously described deterministic algorithms either have a construction time that depends on the alphabet size or need ω(n log σ) bits of working space. ∗Cheriton School of...
Closing in on Time and Space Optimal Construction of Compressed Indexes
Fast and space-efficient construction of compressed indexes such as the compressed suffix array (CSA) and the compressed suffix tree (CST) has been a major open problem until recently, when Belazzougui [STOC 2014] described an algorithm able to build both of these data structures in O(n) (randomized; later improved by the same author to deterministic) time and O(n / log_σ n) words of space, where n is t...
New Construction of Deterministic Compressed Sensing Matrices via Singular Linear Spaces over Finite Fields
As an emerging approach to signal processing, compressed sensing (CS) not only compresses and samples signals with few measurements, but can also ensure the exact recovery of signals. However, these properties depend on the (compressed) sensing matrices. Hence the construction of sensing matrices is the key problem. Compared with the...
Deterministic construction of Fourier-based compressed sensing matrices using an almost difference set
In this paper, a new class of Fourier-based matrices is studied for deterministic compressed sensing. Initially, a basic partial Fourier matrix is introduced by choosing the rows deterministically from the inverse discrete Fourier transform (DFT) matrix. By row/column rearrangement, the matrix is represented as a concatenation of DFT-based submatrices. Then, a full or a part of columns of the c...
Linear-time string indexing and analysis in small space
The field of succinct data structures has flourished over the last 16 years. Starting from the compressed suffix array by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and applications of string indexes based on the Burrows-Wheeler transform (BWT) have been developed, all taking an amount of space that is close to the input size...
Publication date: 2017